Task-based programming in COMPSs to converge from HPC to big data
Authors
Abstract
Task-based programming has proven to be a suitable model for high-performance computing (HPC) applications. Different implementations have demonstrated this, and have promoted the acceptance of task-based programming in the OpenMP standard. Furthermore, in recent years, Apache Spark has gained wide popularity in business and research environments as a programming model for addressing emerging Big Data problems. COMP Superscalar (COMPSs) is a task-based environment that tackles distributed computing (including Clouds), and is a good alternative as a task-based programming model for Big Data applications. This paper describes why we consider task-based programming models a good approach for Big Data applications. The paper includes a comparison of Spark and COMPSs in terms of architecture, programming model and performance. It focuses on the structural differences between the two frameworks, on their programmability interfaces, and on their efficiency, measured by means of three widely known benchmarking kernels: Wordcount, Kmeans and Terasort. These kernels enable the evaluation of the most important functionalities of both programming models and the analysis of different workflows and conditions. The main results of this comparison are: (1) COMPSs is able to extract the inherent parallelism from the user code with minimal coding effort, as opposed to Spark, which requires existing algorithms to be adapted and rewritten by explicitly using its pre-defined functions; (2) COMPSs improves performance when compared with Spark; and (3) COMPSs has been shown to scale better than Spark in most cases. Finally, we discuss the advantages and disadvantages of both frameworks, highlighting the differences that make them unique, thereby helping to choose the right framework for each particular objective.
Keywords: Programming models; Distributed computing; Framework comparison; Big Data programming
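To make the task-based style mentioned in the abstract more concrete, the following is a minimal sketch of the Wordcount kernel written as ordinary functions that are treated as tasks. It is not the COMPSs or Spark API: the standard-library thread pool only stands in for a task runtime, and the names `count_words`, `merge_counts` and `wordcount` are hypothetical, chosen for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(block):
    """Task: count the words in one block of text."""
    return Counter(block.split())


def merge_counts(left, right):
    """Task: merge two partial word counts into one."""
    return left + right


def wordcount(blocks, max_workers=4):
    # Each count_words call depends only on its own block, so the calls
    # are independent and can run in parallel.
    # Assumption: a thread pool stands in for a task runtime here.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_words, blocks))

    # The merge is done sequentially to keep the sketch short.
    total = Counter()
    for partial in partials:
        total = merge_counts(total, partial)
    return total


if __name__ == "__main__":
    blocks = ["big data big compute", "data moves to compute"]
    print(wordcount(blocks))
```

The algorithm keeps its sequential shape (count each block, then merge); only the independent calls are handed to a pool. In this sketch the parallel step is submitted explicitly, whereas the abstract describes COMPSs as extracting that parallelism from the user code.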
Similar resources
CloudFlow: A data-aware programming model for cloud workflow applications on modern HPC systems
Traditional High-Performance Computing (HPC) based big-data applications are usually constrained by having to move large amounts of data to compute facilities for real-time processing. Modern HPC systems, represented by High-Throughput Computing (HTC) and Many-Task Computing (MTC) platforms, on the other hand, intend to achieve the long-held dream of moving compute to data instead. This k...
Enabling Cloud Interoperability with COMPSs
The advent of Cloud computing has given researchers the ability to access resources that satisfy their growing needs, which could not be satisfied by traditional computing resources such as PCs and locally managed clusters. On the other hand, this ability has opened new challenges for the execution of their computational work and the managing of massive amounts of data into resources provid...
COMPSs in the VENUS-C Platform: enabling e-Science applications on the Cloud
COMP Superscalar (COMPSs) is a programming framework that aims to provide an easy-to-use programming model and a runtime to enable the development of applications for distributed environments. Thanks to its modular architecture, COMPSs can use a wide range of computational infrastructures providing a uniform interface for job submission and file transfer operations through adapters for different...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations, because the rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...
Building a Private HPC Cloud for Compute and Data-Intensive Applications
Traditional HPC (High Performance Computing) clusters are best suited for well-formed calculations. The orderly batch-oriented HPC cluster offers maximal potential for performance per application, but limits resource efficiency and user flexibility. An HPC cloud can host multiple virtual HPC clusters, giving the scientists unprecedented flexibility for research and development. With the proper ...
Journal title:
- IJHPCA
Volume 32, Issue
Pages -
Publication date 2018